We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (which we call the representation) and one that is task-specific (which we call the predictor). The objective is to learn the optimal (representation, predictor) pair for each task, under the assumption that the optimal representation is common to all tasks. Within this framework, efficient learning algorithms should transfer knowledge across tasks. We consider the best-arm identification problem with fixed confidence, where, in each round, the learner actively selects both a task and an arm, and observes the corresponding reward. We derive instance-specific sample complexity lower bounds satisfied by any $(\delta_G,\delta_H)$-PAC algorithm (such an algorithm identifies the best representation with probability at least $1-\delta_G$, and the best predictor for a task with probability at least $1-\delta_H$). We devise an algorithm, OSRL-SC, whose sample complexity approaches the lower bound and scales at most as $H(G\log(1/\delta_G)+ X\log(1/\delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors. By comparison, this scaling is significantly better than that of the classical best-arm identification algorithm, which scales as $HGX\log(1/\delta)$.
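To get a feel for the gap between the two scalings quoted above, the following minimal sketch evaluates both expressions with all hidden constants set to one. The function names are hypothetical and the numbers are illustrative only; they are not results from the paper.

```python
import math


def osrl_sc_scaling(X, G, H, delta_G, delta_H):
    """Leading-order scaling H * (G*log(1/delta_G) + X*log(1/delta_H)).

    Constants are dropped (assumption), so only ratios are meaningful.
    """
    return H * (G * math.log(1 / delta_G) + X * math.log(1 / delta_H))


def classical_scaling(X, G, H, delta):
    """Leading-order scaling H * G * X * log(1/delta) of the classical approach."""
    return H * G * X * math.log(1 / delta)


def scaling_ratio(X, G, H, delta):
    """Ratio classical / OSRL-SC when every confidence level equals delta."""
    return classical_scaling(X, G, H, delta) / osrl_sc_scaling(X, G, H, delta, delta)


if __name__ == "__main__":
    # Illustrative sizes: 10 tasks, 5 representations, 4 predictors.
    print(scaling_ratio(X=10, G=5, H=4, delta=0.05))
```

When all confidence levels coincide, the logarithms cancel and the ratio reduces to $GX/(G+X)$, so the advantage grows with both the number of tasks and the number of representations.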
In recent years, there has been growing interest in the effects of data poisoning attacks on data-driven control methods. Poisoning attacks are well known in the machine learning community, but they typically rely on assumptions, such as cross-sample independence, that in general do not hold for linear dynamical systems. Consequently, these systems require different attack and detection methods than those developed for supervised learning problems in the i.i.d.\ setting. Since most data-driven control algorithms make use of the least-squares estimator, we study how poisoning impacts the least-squares estimate through the lens of statistical testing, and ask in what way data poisoning attacks can be detected. We establish under which conditions the set of models compatible with the data includes the true model of the system, and we analyze different poisoning strategies for the attacker. Based on these arguments, we propose a stealthy data poisoning attack on the least-squares estimator that can escape classical statistical tests, and conclude by showing the efficiency of the proposed attack.
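As a rough illustration of why poisoning matters for the least-squares estimator, the sketch below estimates the parameter of a scalar system $x_{t+1} = a x_t$ from a trajectory, then repeats the estimate after the recorded next states have been tampered with. The scalar noise-free system, the function names, and the naive perturbation are all assumptions made for illustration; this is not the stealthy attack proposed in the paper.

```python
def simulate(a, x0, steps):
    """Noise-free trajectory of the scalar system x_{t+1} = a * x_t (assumption)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs


def least_squares_estimate(inputs, outputs):
    """Scalar least squares: argmin_a sum (y_t - a * x_t)^2 = sum(x*y) / sum(x*x)."""
    denom = sum(x * x for x in inputs)
    if denom == 0:
        raise ValueError("inputs carry no excitation")
    return sum(x * y for x, y in zip(inputs, outputs)) / denom


def poison(inputs, outputs, shift):
    """Naive (easily detectable) poisoning: add shift * x_t to each recorded x_{t+1}."""
    return [y + shift * x for x, y in zip(inputs, outputs)]


if __name__ == "__main__":
    xs = simulate(a=0.5, x0=1.0, steps=20)
    inputs, outputs = xs[:-1], xs[1:]
    print("clean estimate:   ", least_squares_estimate(inputs, outputs))
    print("poisoned estimate:", least_squares_estimate(inputs, poison(inputs, outputs, 0.2)))
```

Because the samples come from a single trajectory, consecutive data points are dependent, which is the reason i.i.d.-based detection arguments do not carry over directly.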
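We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model. We first derive an instance-specific lower bound on the expected number of samples required to identify an $\varepsilon$-optimal policy with probability $1-\delta$. The lower bound characterizes the optimal sampling rule as the solution of an intricate non-convex optimization program, but it can be used as the starting point for devising simple and near-optimal sampling rules and algorithms. We devise such algorithms. One of them exhibits a sample complexity upper bounded by $\mathcal{O}\left(\frac{d}{(\varepsilon+\Delta)^2}\left(\log\left(\frac{1}{\delta}\right)+d\right)\right)$, where $\Delta$ denotes the minimum reward gap of sub-optimal actions and $d$ is the dimension of the feature space. This upper bound holds in the moderate-confidence regime (i.e., for all $\delta$), and matches existing minimax and gap-dependent lower bounds. We extend our algorithm to episodic linear MDPs.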
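In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, which decide whether a new data flow can be admitted and, if so, on which slice. The objective is to devise a joint measurement and decision strategy that returns a correct decision (e.g., the least loaded slice) with a certain level of confidence while minimizing the measurement cost (the number of measurements made before committing to the decision). We study the design of such strategies for several natural admission criteria specifying what a correct decision is. For each of these criteria, using tools from best-arm identification in bandits, we first derive an explicit information-theoretic lower bound on the cost of any algorithm returning the correct decision with fixed confidence. We then devise joint measurement and decision strategies achieving this theoretical limit. We compare the measurement costs of these strategies empirically, against both the lower bounds and a naive measurement scheme. We find that our algorithm significantly outperforms the naive scheme (by a factor of $2$ to $8$).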
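Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms that learn optimal tilt control policies from existing data (the so-called passive learning setting) or from data actively generated by the algorithms (the active learning setting). We formalize the design of such algorithms as a Best Policy Identification (BPI) problem in Contextual Linear Multi-Armed Bandits (CL-MAB). An arm represents an antenna tilt update; the context captures current network conditions; the reward corresponds to an improvement in performance, mixing coverage and capacity; and the objective is to identify, with a given level of confidence, an approximately optimal policy (a function mapping each context to an arm with maximal reward). For CL-MAB in both the active and passive learning settings, we derive information-theoretic lower bounds on the number of samples required by any algorithm that returns an approximately optimal policy with a given level of confidence, and we devise algorithms achieving these fundamental limits. We apply our algorithms to the Remote Electrical Tilt (RET) optimization problem in cellular networks, and show that they can produce optimal tilt update policies using far fewer data samples than naive or existing rule-based learning algorithms.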
Real-world robotic grasping can be done robustly if complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and to point permutation, and it generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and substantially improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
Model estimates obtained from traditional subspace identification methods may be subject to significant variance. This elevated variance is aggravated in the cases of large models or of a limited sample size. Common solutions to reduce the effect of variance are regularized estimators, shrinkage estimators and Bayesian estimation. In the current work we investigate the latter two solutions, which have not yet been applied to subspace identification. Our experimental results show that our proposed estimators may reduce the estimation risk up to $40\%$ of that of traditional subspace methods.
This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Universit\'e de Montr\'eal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).
Counterfactual explanation is a common class of methods for producing local explanations of machine learning decisions. For a given instance, these methods aim to find the smallest modification of feature values that changes the decision predicted by a machine learning model. One of the challenges of counterfactual explanation is the efficient generation of realistic counterfactuals. To address this challenge, we propose VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator that are jointly trained, for regression or classification tasks. VCNet is able both to generate predictions and to generate counterfactual explanations without having to solve another minimisation problem. Our contribution is the generation of counterfactuals that are close to the distribution of the predicted class. This is done by learning a variational autoencoder conditioned on the output of the predictor in a joint-training fashion. We present an empirical evaluation on tabular datasets and across several interpretability metrics. The results are competitive with the state-of-the-art method.
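The notion of a "smallest modification that changes the decision" can be made concrete in the simplest possible case. The sketch below assumes a linear binary classifier with decision rule sign(w·x + b), for which the closest counterfactual in Euclidean distance has a closed form. This is a textbook special case chosen for illustration, not VCNet, and the function names are hypothetical.

```python
def linear_score(x, w, b):
    """Decision score w.x + b of a linear classifier (assumed model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def linear_counterfactual(x, w, b, margin=0.01):
    """Move x along w until it sits just past the decision boundary.

    The orthogonal projection onto the hyperplane w.x + b = 0 is the closest
    point with score zero; the small margin pushes the point to the other side.
    If the score is exactly zero, the point is returned unchanged.
    """
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0:
        raise ValueError("weight vector must be non-zero")
    step = -(1 + margin) * linear_score(x, w, b) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]


if __name__ == "__main__":
    x, w, b = [0.0, 0.0], [1.0, 0.0], -1.0
    x_cf = linear_counterfactual(x, w, b)
    print(linear_score(x, w, b), linear_score(x_cf, w, b))
```

Such a closed-form counterfactual ignores whether the modified instance is realistic, which is the gap that distribution-aware generators aim to address.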
Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: download a copy of a foundation model, and fine-tune it using some in-house data about the target task of interest. Consequently, the Internet is swarmed by a handful of foundation models fine-tuned on many diverse tasks. Yet, these individual fine-tunings often lack strong generalization and exist in isolation without benefiting from each other. In our opinion, this is a missed opportunity, as these specialized models contain diverse features. Based on this insight, we propose model recycling, a simple strategy that leverages multiple fine-tunings of the same foundation model on diverse auxiliary tasks, and repurposes them as rich and diverse initializations for the target task. Specifically, model recycling fine-tunes in parallel each specialized model on the target task, and then averages the weights of all target fine-tunings into a final model. Empirically, we show that model recycling maximizes model diversity by benefiting from diverse auxiliary tasks, and achieves a new state of the art on the reference DomainBed benchmark for out-of-distribution generalization. Looking forward, model recycling is a contribution to the emerging paradigm of updatable machine learning where, akin to open-source software development, the community collaborates to incrementally and reliably update machine learning models.
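The final step described above, averaging the weights of all target fine-tunings into one model, can be sketched as follows. The example assumes each model is represented as a dictionary mapping parameter names to flat lists of floats and uses a uniform average; both the representation and the function name are hypothetical simplifications, not the authors' implementation.

```python
def average_weights(models):
    """Uniformly average parameters across models sharing the same architecture.

    Each model is a dict {parameter_name: list of floats} (assumed format).
    """
    if not models:
        raise ValueError("need at least one model")
    names = set(models[0])
    for model in models:
        if set(model) != names:
            raise ValueError("all models must have the same parameter names")
    averaged = {}
    for name in models[0]:
        columns = [model[name] for model in models]
        if len({len(col) for col in columns}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Average each coordinate across the fine-tuned models.
        averaged[name] = [sum(vals) / len(models) for vals in zip(*columns)]
    return averaged


if __name__ == "__main__":
    fine_tuned = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [2.0]},
    ]
    print(average_weights(fine_tuned))
```

Averaging is only meaningful here because every fine-tuning starts from the same foundation model, so the parameters of the different models are aligned coordinate by coordinate.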